Abstract

In recent years, deep learning has begun to supplant traditional machine learning algorithms in a variety of fields,
including machine translation (MT), pattern recognition (PR), natural language processing (NLP), speech
recognition (SR), and computer vision. Optical character recognition (OCR) systems built on deep learning
techniques have recently achieved great success. Within pattern recognition and computer vision, handwritten
character recognition is still considered one of the most challenging tasks. The height, orientation, and width
of handwritten characters vary because different people use different writing instruments and have their own
writing styles, which makes handwriting recognition difficult. Arabic and Urdu have received comparatively
little research attention. This article summarizes and compares the most significant deep learning techniques
used to recognize Arabic-adapted scripts such as Arabic and Urdu.


Keywords: Arabic Natural Language Processing; Urdu Natural Language Processing; Optical Character Recognition; 
Handwritten Character Recognition; Deep Learning. 


1 | Introduction 


Optical Character Recognition (OCR) refers to the automated conversion of printed or handwritten text
images, captured by scanning or photography, into text that can be read by machines. The technology of OCR
has been in existence since the early days of computing and has undergone continuous enhancement through 
the evolution of machine learning and computer vision techniques [1-3]. 


Corresponding Author: mohamed.gtisha@cis.edu.eg; m.gresha@fci.zu.edu.eg
https://doi.org/10.61356/j.mawa.2024.26861

Licensee Multicriteria Algorithms with Applications. This article is an open access article distributed under the terms
and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0).


Deep Learning Algorithms for Arabic Optical Character Recognition: A Survey 66 


OCR technology works by analyzing the visual features of the input image, including the individual
characters' size, shape, and placement. Classification algorithms are then employed to identify each
character and transform it into digital text. This technology is applicable to various types of documents, such
as receipts, books, newspapers, handwritten notes, and business cards [4].


The practical uses of OCR technology are extensive and include streamlining data entry, digitizing
historical documents, and enhancing accessibility for visually impaired individuals. When combined
with other technologies, such as natural language processing (NLP) and machine translation (MT), OCR can
facilitate advanced applications such as sentiment analysis (SA) and automated language translation (ALT) [5].


Although OCR technology has significantly transformed how textual information is processed and analyzed,
it is not entirely error-free, and certain challenges remain in its application. One of the foremost
difficulties is managing variations in the input image, such as differences in font style and size or, for
handwritten text, in spacing and writing style. To ensure precise outcomes, OCR algorithms must
be carefully designed and trained to handle such variations. Nonetheless, OCR technology's impact on
society is projected to expand in the future [6-8].


Deep learning has revolutionized OCR by allowing more accurate character recognition than ever before. 
With the use of deep neural networks, OCR systems can learn to recognize characters and words with high 
accuracy, even when dealing with noisy or distorted images [9, 10]. 


One of the main advantages of deep learning-based OCR systems is that they can recognize characters and 
words based on their visual features rather than relying on pre-defined patterns or templates. This means that 
they are more adaptable to new fonts, styles, and languages, which makes them more practical for real-world
applications [11]. 


There are two main approaches to deep learning-based OCR: fully supervised learning and semi-supervised 
or unsupervised learning. Fully supervised learning requires a large dataset of labeled images of input 
characters or words to train, while semi-supervised or unsupervised learning can work with fewer labeled
images and can also utilize unlabeled data during training [2, 10]. 


Overall, deep learning has led to significant improvements in the accuracy and efficiency of OCR systems 
and has enabled a wide range of new applications for character recognition, including document digitization 
[12] and automatic text recognition in images or video surveillance [13, 14]. 


Handwritten character recognition is a field of research that concentrates on formulating algorithms and
methodologies for automatically identifying handwritten characters and converting them into
machine-readable or digital text. This field plays a significant role in various sectors, such as healthcare,
education, and finance, which still rely heavily on handwritten documents [15].


With the progress in machine learning and computer vision, the field of recognizing handwritten characters 
has progressed significantly in recent years. These systems first pre-process the incoming image, then segment 
the characters into their constituent parts, and finally use classification algorithms to identify each character. 
Deep learning techniques like convolutional neural networks (CNN) have been particularly effective in 
achieving state-of-the-art performance in handwritten character recognition. However, obstacles such as the 
diversity of handwriting styles and the need for substantial amounts of annotated data still exist. Nonetheless, 
further progress in this field offers immense potential for enhancing the processing and analysis of 
handwritten documents. 
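
The first two stages of the pipeline described above (pre-processing and segmentation) can be illustrated
with a minimal Python sketch. It assumes a grayscale image given as rows of 0-255 integers and separates
glyphs at empty pixel columns; the function names and the threshold value are illustrative assumptions and
are not taken from any surveyed system. Segmentation at empty columns only suits isolated glyphs; connected
cursive text generally needs more elaborate or implicit segmentation, and the classification stage is omitted.

```python
def binarize(image, threshold=128):
    """Pre-processing: map grayscale pixels to 1 (ink) or 0 (background).

    The threshold of 128 is an assumed, illustrative value.
    """
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment_columns(binary):
    """Segmentation: return (start, end) column ranges that contain ink.

    A vertical projection profile counts ink pixels per column; runs of
    non-empty columns are treated as one candidate character.
    """
    width = len(binary[0])
    profile = [sum(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(profile):
        if ink and start is None:
            start = x                      # a new run of ink begins
        elif not ink and start is not None:
            segments.append((start, x))    # the run ended at column x
            start = None
    if start is not None:                  # run reaches the right border
        segments.append((start, width))
    return segments


# Toy 2x6 image: two blobs of dark pixels separated by a blank column.
toy = [[255, 0, 0, 255, 0, 255],
       [255, 0, 255, 255, 0, 255]]
segments = segment_columns(binarize(toy))
```

Each returned range would then be cropped and passed to a classifier such as a CNN.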


This work explores different deep learning techniques used in the recognition of Arabic-adapted scripts,
namely Arabic and Urdu. It lists studies conducted in the last 5 years, taking into account the existing
survey papers in this field, and also covers studies from other years to provide a comprehensive
overview. The structure of this paper is as follows: Section 2 reviews previous studies that
have used deep learning for Arabic language recognition. Section 3 examines past research that has utilized
deep learning techniques for recognizing other languages, like Urdu. The conclusion is presented in Section
4.


Mahdi et al. | Multicriteria. Algo. Appl. 2 (2024) 65-79


2 | Arabic Natural Language Processing 


Arabic is the most widely used of the Semitic languages. It is the official language of 26 countries, and an
estimated 372 million people around the globe speak it. Muslims around the world use Arabic to read the Holy
Quran, which is written in Arabic. This places Arabic as the sixth most common language in use today. It is
also one of the six official languages of the United Nations (UN), along with Chinese, English, French,
Russian, and Spanish [16]. Arabic is written in a cursive script and read from right to left. The Arabic
alphabet has 28 characters. The size of a character is not fixed; it changes depending on the shape of the
character, the font used, and its location within the word (beginning, middle, end, or isolated), as shown in
Table 2. In addition, the Arabic writing system uses diacritical marks to represent short vowels and other
sounds. Examples of these diacritics include "fatha," "damma," "kasra," and "tanween," as shown in Table 3.
Arabic also has a large number of ligatures, which are created by combining two or more characters, such as
the combination "alif-laam." A single character can carry from one to three dots, and there are special
characters such as "Hamza," as shown in Table 1. A single English term can correspond to several different
Arabic words, as is the case with the words "good" and "love." In addition, other languages, such as Farsi,
Kurdish, Urdu, and Pashto, borrow words, characteristics, and structural elements from Arabic. Taken
together, these properties constitute a significant obstacle for researchers working in the field of Arabic
language comprehension [17-19].


In [20], the authors introduce Hijja, a new dataset of Arabic letters written by children aged 7-12, and propose 
an automatic handwriting recognition model based on CNN. They train their model on both Hijja and the 
Arabic Handwritten Character Dataset (AHCD) and achieve promising results, with accuracies of 97% and 
88% on the AHCD and Hijja datasets, respectively. The authors also find that their model outperforms other 
models in the literature on all metrics. However, they note that the Hijja dataset is more challenging than the 
AHCD dataset, as their model performs worse on it. Overall, this paper demonstrates the potential of using 
convolutional neural networks for recognizing Arabic handwriting. 


In [21], the authors present a new deep learning system called AHCR-DLS for recognizing Arabic handwritten
characters, aiming to address the challenges associated with this task. The AHCR-DLS system achieves high
accuracy rates on the training dataset, with an average accuracy of 97.3% for HMB1 and 96.8% for HMB2, and
on the test dataset, with an average accuracy of 95.5% for HMB1 and 94.9% for HMB2, demonstrating the
effectiveness of the proposed approach.


In [22], the authors propose a novel approach for classifying handwritten Arabic characters using a CNN and 
an optimized leaky ReLU activation function. The results showed that the proposed approach achieved high 
accuracy, outperforming both ReLU (97.8%) and leaky ReLU (97.9%) with the four datasets used in this 
study. Overall, this paper demonstrates that using deep learning techniques can significantly improve the 
performance of handwritten Arabic character recognition systems compared to traditional methods or other 
activation functions like ReLU or leaky ReLU. 
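
For reference, the two baseline activation functions compared in [22] can be written in a few lines of Python.
This sketch shows only the standard ReLU and leaky ReLU definitions; the optimized variant proposed in [22] is
not reproduced here, and the default slope `alpha=0.01` is a commonly used value assumed for illustration.

```python
def relu(x):
    """Standard ReLU: pass positive inputs through, zero out the rest."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: negative inputs are scaled by a small slope `alpha`
    instead of being zeroed, so their gradient does not vanish.
    The default alpha is an assumption, not the value tuned in [22].
    """
    return x if x > 0 else alpha * x


# Behaviour on a negative and a positive pre-activation.
negative_outputs = (relu(-2.0), leaky_relu(-2.0, alpha=0.1))
positive_outputs = (relu(3.0), leaky_relu(3.0, alpha=0.1))
```

Tuning the negative slope, rather than fixing it, is the kind of degree of freedom an optimized leaky ReLU can
exploit.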


Elkhayati and Elkettani [23] introduce a new directed CNN model (UnCNN) for recognizing isolated Arabic
handwritten characters. The paper reports the effectiveness of the approach by comparing it with BsCNN
and other recent models. The proposed UnCNN model is evaluated using four benchmark databases:
IFHCDB, AHCD, AIA9K, and HACDB. Compared to recent models in the literature, UnCNN achieves
competitive performance with some models and outperforms others.


In [24], Alghyaline presents a novel approach for recognizing printed Arabic characters using deep learning.
The approach, called Printed Arabic Optical Character Recognition (PAOCR), is based on the state-of-the-art
You Only Look Once (YOLO) object detector and consists of four techniques. The first technique involves
customizing and training YOLOv4 on deep CNNs to recognize Arabic characters. The second technique
involves processing overlapped bounding boxes to ensure the most accurate box is selected for each character.
The third technique uses the Hunspell library to check word spelling and correct any errors. Finally, the fourth
technique uses edit distance to compare OCR misspelled words with Hunspell's suggestions and choose the
closest correct word. The proposed PAOCR system achieved an accuracy rate of 82.4% on a
dataset of printed Arabic characters.
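
The fourth technique, choosing the suggestion closest to a misspelled OCR output, can be sketched as follows.
This is a generic Levenshtein-distance illustration rather than the implementation in [24]: the candidate list
is passed in directly (no Hunspell call is made), and the function names and the transliterated example words
are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))          # distances from "" to b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # delete ca
                            curr[j - 1] + 1,     # insert cb
                            prev[j - 1] + cost)) # substitute or match
        prev = curr
    return prev[-1]


def closest_suggestion(ocr_word, suggestions):
    """Pick the candidate with the smallest edit distance to the OCR word.
    `suggestions` stands in for a spell checker's output (assumed input)."""
    return min(suggestions, key=lambda s: edit_distance(ocr_word, s))


# Hypothetical transliterated example: the OCR output dropped one letter.
best = closest_suggestion("ktab", ["kitab", "katib", "bab"])
```

Ties are resolved in favour of the earlier candidate, so the order of the suggestion list matters when
several words are equally close.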


Table 1. Hamza and its combination with other letters in Arabic.


Name                           Pronunciation    Meaning in English
Hamza on an Alef               hamza-adam       Adam
Hamza above an Alef            abayn            parents
Hamza below an Alef            ibrahim          Ibrahim
Hamza on top of Ya             sa'imtu          I got tired
Hamza under Ya                 muta'illifun     a novelist
Hamza on Ya                    fa'atun          [illegible]
Hamza on top of Ya with Waw    ma'wiyyun        square
Hamza on top of Waw            mu'minoon        believers


Table 2. The pronunciation of each Arabic letter in English and Arabic and its location within the word (beginning,
middle, end, or isolated).


No.   Pronunciation   Pronunciation   Isolated   Beginning   Middle   End
      in English      in Arabic
1     Alef            ألف             ا          ا           ـا       ـا
2     Baa             باء             ب          بـ          ـبـ      ـب
3     Taa             تاء             ت          تـ          ـتـ      ـت
4     Thaa            ثاء             ث          ثـ          ـثـ      ـث
5     Jim             جيم             ج          جـ          ـجـ      ـج
6     Ha              حاء             ح          حـ          ـحـ      ـح
7     Kha             خاء             خ          خـ          ـخـ      ـخ
8     Dal             دال             د          د           ـد       ـد
9     Thal            ذال             ذ          ذ           ـذ       ـذ
10    Ra              راء             ر          ر           ـر       ـر
11    Zayn            زاي             ز          ز           ـز       ـز
12    Seen            سين             س          سـ          ـسـ      ـس
13    Sheen           شين             ش          شـ          ـشـ      ـش
14    Sad             صاد             ص          صـ          ـصـ      ـص
15    Dad             ضاد             ض          ضـ          ـضـ      ـض
16    Taa             طاء             ط          طـ          ـطـ      ـط
17    Zaa             ظاء             ظ          ظـ          ـظـ      ـظ
18    'Ayn            عين             ع          عـ          ـعـ      ـع
19    Ghayn           غين             غ          غـ          ـغـ      ـغ
20    Fa              فاء             ف          فـ          ـفـ      ـف
21    Qaf             قاف             ق          قـ          ـقـ      ـق
22    Kaf             كاف             ك          كـ          ـكـ      ـك
23    Lam             لام             ل          لـ          ـلـ      ـل
24    Meem            ميم             م          مـ          ـمـ      ـم
25    Noon            نون             ن          نـ          ـنـ      ـن
26    Ha              هاء             ه          هـ          ـهـ      ـه
27    Waw             واو             و          و           ـو       ـو
28    Ya              ياء             ي          يـ          ـيـ      ـي
Table 3. Arabic diacritics with examples and pronunciations.


Name                  Diacritic   Pronunciation of example   Meaning in English
Fatha                 ـَ           kataba                     he wrote
Kasra                 ـِ           kitabin                    a book
Damma                 ـُ           kutiba                     it was written
Sukun                 ـْ           k-t-b                      books
Shadda                ـّ           kittab                     writer
Tanween: Fathatan     ـً           kitaban                    a book
Tanween: Kasratan     ـٍ           kitabin                    a book
Tanween: Dammatan     ـٌ           kitabun                    a book

2.1 | CNN Hybrid with Other Algorithms 


The study in [25] introduces a novel model for single-font and multi-font text using support vector machine
(SVM) and CNN classifiers. The model incorporates dropout to address overfitting and performs automatic
classification and feature extraction. The authors propose a deep neural network training rule and combine
max-margin minimum classification error (M3CE) and cross-entropy approaches for improved results. The
model is evaluated on several databases and compared to state-of-the-art Arabic text recognition methods,
demonstrating favorable outcomes.


In [26], researchers discuss the use of deep learning and genetic algorithms for recognizing handwritten Arabic 
characters. This study aims to automate Arabic content recognition, which is still in its early stages compared 
to Latin and Chinese content recognition. This study focuses on text processing, particularly text 
segmentation and recognition. It discusses the challenges of text segmentation and proposes solutions for 
each challenge. The recognition phase utilizes a CNN to improve classification algorithms by automatically 
extracting features from images. The study evaluates 14 different CNN architectures and achieves a maximum 
testing accuracy of 91.96% on handwritten Arabic characters. To optimize training parameters, a transfer 
learning and genetic algorithm approach called "HMB-AHCR-DLGA" is introduced, achieving a testing 
accuracy of 92.88% through five optimization experiments. 


In [27], researchers compare the performance of Bayesian networks and CNNs for recognizing Arabic handwritten
words. The study reports that the CNN-based system outperforms the Bayesian-based system, achieving an
accuracy rate of 96.8% compared to 91.5%. The paper describes the proposed system based on probabilistic 
graphical models (PGM) and CNN models and provides details on the experiments conducted to compare 
the performance of these models on the IFN-ENIT database. Overall, this paper provides valuable insights 
into the effectiveness of different machine learning methods for Arabic handwriting recognition. 


2.2 | Long Short-Term Memory (LSTM) 


In [28], the performance of 1D and 2D LSTM architectures for recognizing handwritten Arabic is compared.
The authors demonstrate that, with a simple pre-processing step that normalizes the position and baseline of
letters, 1D LSTM can achieve superior performance while being faster in learning and convergence. The
authors report achieving an accuracy of 91.5% on the IFN/ENIT database using their proposed pipeline,
which outperforms both manually crafted features with 1D LSTM (87.4%) and 2D LSTM networks (72.9%).
The authors also compare their results with previous work that concluded that manually crafted features
perform better than automatically learned features (using 1D LSTM) for Arabic handwriting recognition and
demonstrate the superior performance of their proposed system over manually crafted features for this task.


2.3 | LSTM Hybrid with other Algorithms 


In [13], the authors present a deep learning-based system for recognizing Arabic script that has been 
benchmarked on the KHATT dataset. The deep learning model is based on an MDLSTM architecture with 
a CTC layer for alignment, and data augmentation techniques are introduced to enrich the input feature space. 
The system achieves an accuracy of 80% on the KHATT dataset, representing a significant improvement 
over the previous method. The authors attribute this improvement to the use of deep learning and data 
augmentation techniques. 
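
The role of a CTC layer can be made concrete with a small decoding sketch. The code below shows generic
best-path (greedy) CTC decoding, in which the most likely label is taken per frame, repeated labels are
collapsed, and blanks are removed. It is an illustration under stated assumptions and not the decoder used in
[13]; the toy alphabet, the blank index of 0, and the frame probabilities are all hypothetical.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_probs: one probability list per time step, indexed by label id.
    alphabet:    symbol for each label id; alphabet[blank] is unused.
    """
    # 1. Take the most probable label at every time step.
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    # 2. Collapse consecutive repeats, then 3. drop blanks.
    decoded, previous = [], None
    for label in best_path:
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)


# Toy alphabet: id 0 is the blank, ids 1 and 2 are the symbols "a" and "b".
alphabet = ["-", "a", "b"]
frames = [
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.7, 0.2],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the two a's
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.1, 0.7],    # b
]
text = ctc_greedy_decode(frames, alphabet)
```

The blank between the second and fourth frames is what allows a genuinely doubled character to survive the
collapsing step, which is why no explicit character-level alignment is needed during training.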


In [29], a new approach uses an LSTM network with the aid of an Elephant Herding Optimization (EHO) 
algorithm. Hybrid feature descriptors, such as ELBP and IDMN, are employed to extract feature values from 
segmented individual characters. The EHO algorithm is then utilized to optimize the dimension of these 
features, reducing overfitting concerns and improving the training and testing mechanisms of the classifier. 
The optimized features are then inputted into the LSTM network for character classification. The simulation 
results show that the proposed EHO-LSTM model achieves high accuracy rates of 96.66%, 96.67%, and 
99.93% for English, Kannada, and Arabic character recognition on the chars74K and MADbase digit datasets. 


2.4 | Other Techniques 


The research in [30] presents a model for Arabic handwriting recognition that combines the ResNet50
architecture with either SVM or Random Forest (RF) algorithms. The study finds that combining ResNet50
with Random Forest produces more accurate and consistent results compared to using ResNet50 alone. The
experimental work conducted in this study involves three datasets: the Arabic Handwritten Character Dataset
(AHCD), the AlexU Isolated Alphabet dataset (AIA9K), and the Hijja dataset. The modified ResNet50
architecture achieves recognition rates of 92.37%, 98.39%, and 91.64% on the three datasets, while the combined
architecture achieves recognition rates of 95%, 99%, and 92.4% on the same datasets, respectively.


The study in [31] proposes the use of residual neural networks (ResNets) for recognizing Arabic offline
isolated handwritten characters. The methodology consists of pre-processing, training the ResNet, and testing 
it on datasets. The approach achieves high accuracy on three datasets (MADBase, AIA9K, and AHCD) and 
validation accuracy on a combined dataset. 


In [32], the authors present a comprehensive handwritten Arabic-Maghrebi character database and propose a new
approach for recognizing them using a deep auto-encoder scheme. The database includes all basic Arabic
alphabets written in the Maghrebi style, which has not been previously studied. The proposed approach
achieves a recognition rate of 88% using a backpropagation neural network with the CENPARMI dataset and
76.54% using a combination of multiple HMM classifiers with the IFN/ENIT dataset.


The authors of [33] suggest a two-step method for detecting and recovering out-of-vocabulary words in 
Arabic handwritten text recognition. The effectiveness of this approach is compared to one-step methods 
that use a large static lexicon or a combination of sub-word modeling methods. The results presented in this 
article show that the proposed two-step approach outperforms the other methods, achieving an accuracy rate 
of 91.5% for out-of-vocabulary word detection and 87.3% for out-of-vocabulary word recovery. This paper 
provides valuable insights into improving Arabic handwriting recognition systems by addressing the challenge 
of out-of-vocabulary words. 
Table 4. Summary of some deep learning approaches applied to the Arabic language.


Reference                       Approach                      Dataset        Accuracy
Altwaijry et al. 2021 [20]      CNN                           AHCD           97%
                                                              Hijja          88%
Guptha et al. 2022 [29]         EHO-LSTM                      chars74K       96.66%
                                                              MADbase        99.93%
Ahmad et al. 2020 [34]          MDLSTM                        KHATT          80%
Yousefi et al. 2015 [28]        1D LSTM                       IFN/ENIT       87.4%
                                2D LSTM                       IFN/ENIT       72.9%
Alghyaline 2022 [24]            PAOCR                         APTI           82.4%
Balaha et al. 2021 [26]         CNN and genetic algorithms    HMBD           92.88%
[22] (2022)                     CNN + leaky ReLU              AHCD           99%
                                                              [illegible]    [illegible]
                                                              AIA9K          94%
                                                              D.MNIST        99%
Khémiti et al. 2019 [27]        PGM+CNN                       IFN/ENIT       96.8%
Elkhayati et al. 2022 [23]      UnCNN                         AIA9K          96.1%
                                                              IFHCDB         98.4%
                                                              AHCD           98.7%
                                                              HACDB          95.4%
Djaghbellou et al. 2022 [32]    deep auto-encoder             HAMCDB         98.63%
                                                              AHCD           98.95%


2.5 | Arabic Datasets 


2.5.1 | Printed Arabic 


The APTI dataset¹ was created to evaluate OCR performance on printed Arabic [35]. This extensive dataset
comprises 45,313,600 images, each containing a single Arabic word with approximately 250 million Arabic 
characters. The dataset is synthetic, generated from a distinct set of 113,284 words, and features ten font 
types, ten font sizes ranging from 6 pt to 24 pt, and four font styles. It is split into five sets, with the fifth set 
reserved for testing and the others for training. The first four sets were publicly released by the author. A 
sample of APTI dataset is shown in Figure 1. 


The MMAC dataset² is a collection of printed Arabic text that was created in 2010 [36]. It comprises
282,593 unique words and 66,725 pieces of Arabic words (PAWs), each stored as an individual image. The data
was gathered from a range of sources, including old books, Arabic research, and the Holy Quran. A notable
feature of the MMAC dataset is that the images have been subjected to skewing and noise addition to triple
their quantity. This makes the dataset more representative of the variations in images that one might
encounter in practical applications. Sample images from the MMAC dataset can be seen in Figure 2.


The KAFD dataset³ was created through a collaboration between King Fahd University and Qassim
University [37]. The dataset comprises 15,068 printed text images and 2,576,024 lines. The images have
varying resolutions of 100 dpi, 200 dpi, 300 dpi, and 600 dpi, and feature four different Arabic fonts, ten font
sizes ranging from 8 to 24 points, and four font styles: Normal, Bold, Italic, and Bold Italic. The dataset is
divided into training, testing, and validation sets and is a valuable resource for researchers and developers
working on OCR and Arabic language processing.


Figure 1. Fonts used to generate the APTI Database: (A) Andalus, (B) Arabic Transparent, (C) Advertising Bold, (D)
Diwani Letter, (E) Deco Type Thuluth, (F) Simplified Arabic, (G) Tahoma, (H) Traditional Arabic, (I) Deco Type
Naskh, (J) M Unicode Sara.


Figure 2. A representative image from the MMAC dataset.


The Yarmouk Arabic OCR dataset¹ is a collection of printed Arabic text images [38]. The dataset comprises
8,994 images with a resolution of 300 dpi, and the images contain 436,921 words extracted from the Wikipedia 
website. 


The APTID/MF dataset is a collection of printed Arabic text images that was developed in 2013 [39]. The 
dataset consists of 1,845 images with a resolution of 300 dpi, containing 27,402 characters. The images were 
taken from 387 pages of Arabic documents and include ten different font types, two font styles (normal and 
bold), and four font sizes (12 pt, 14 pt, 16 pt, and 18 pt). 


The ARABASE dataset is a collection of printed and handwritten Arabic text images [40]. The handwritten
images were written by over 400 writers, mostly from Tunisia, while the printed text was obtained from daily 
newspapers and a book published by the Tunisian national library on the internet. The images have resolutions 
ranging from 200 dpi to 600 dpi. 


2.5.2 | Arabic Handwritten 


The HACDB dataset² is a collection of Arabic handwritten characters developed by a team of
researchers [41]. It consists of 6,600 characters, with each of the 50 writers producing 132 shapes of characters.
The shapes depict the different ways Arabic characters can be written in various positions within a word, such
as the beginning, middle, end, and when they stand alone. This dataset is highly valuable for researchers and
developers working on Arabic character recognition and OCR tasks. It has been used in several studies related
to Arabic handwriting recognition and is particularly advantageous in evaluating machine learning algorithms'
performance. One of the HACDB dataset's outstanding features is its inclusion of various handwriting styles,
as it was collected from a diverse group of writers. This aspect makes it more reflective of the variations in
Arabic handwriting that can be encountered in real-world scenarios. Sample images from the HACDB dataset
can be seen in Figure 3.


¹ https://drive.google.com/drive/folders/0B4Kx3iMuktgsdC12Ui1 neklInMzQ?resourcekey=0dX3YKFT4xArR1rT81wQ2wSw


The Hijja dataset¹ is a collection of handwritten Arabic letters that were obtained from school children
between the ages of 7 and 12 who speak Arabic [20]. The data was collected in Riyadh, Saudi Arabia, from 
January to April 2019, and comprises a total of 47,434 characters written by 591 participants in various forms. 
Examples of images from the Hijja dataset are available for viewing in Figure 4. 


Figure 3. A representative image from the HACDB dataset.

Figure 4. Sample picture from the Hijja dataset.


The KHATT dataset² is a collection of handwritten text [42]. It includes one thousand forms written by one
thousand distinct writers and consists of two thousand paragraphs with 9,327 lines sourced from forty-six 
different sources. The dataset is available in three resolutions, which are 200 dpi, 300 dpi, and 600 dpi, and is 
partitioned into 70% for training, 15% for testing, and 15% for validation purposes. The KHATT dataset is 
a valuable resource for researchers and developers working on OCR applications and recognizing handwritten 
text. It has been used in several studies related to handwritten text recognition and OCR. 


The IFN/ENIT database³ is a dataset of handwritten text images [43]. It contains 2,200 images of handwritten
Tunisian city names with a resolution of 300 dpi, comprising 26,459 words and 212,211 Arabic characters.
Considered one of the earliest handwritten text datasets available, the database has been cited 640 times,
making it the most widely referenced Arabic OCR database. The IFN/ENIT dataset is a valuable resource for
researchers and developers working on OCR applications, particularly for Arabic handwriting recognition. It
has been used in various research studies related to OCR and handwriting recognition.


3 | Urdu Natural Language Processing 


Urdu, the fifth most spoken language in the world, serves 4.7 percent of the global population and is widely
spoken in Pakistan as the national language and in India as one of the 22 official languages [44]. Urdu
speakers are dispersed across 20 different countries, including India, Pakistan, Turkey, Saudi Arabia,
Bangladesh, Afghanistan, Iran, Azerbaijan, and Nepal [45]. The Urdu language is a composition of various
languages such as Arabic, Persian, Turkish, and Hindi, and is non-scripting [46]. Its script is cursive and
heavily influenced by the Arabic and Persian scripts, with which it shares similarities at the writing level,
and it has no concept of capital or small letters.


¹ https://github.com/israksu/Hijja2
² http://khatt.ideas2serve.net/KHATTDownload.php
³ http://www.ifnenit.com/


Urdu is known as a bidirectional language because it combines two writing directions: the Urdu script, which
is written from right to left, and numerals, which are written from left to right. The Urdu script consists of
38 basic letters and 10 numerals, as depicted in Figure 5. Additionally, this character set is a superset of
those of related scripts, such as Arabic, which has 28 characters, and Persian, which has 32 characters [47].
Each character in the Urdu language has multiple forms or shapes (isolated, initial, medial, and final)
depending on its position within a word. For instance, characters such as Beh (ب), Peh (پ), and Teh (ت) can
join with other characters from both directions, i.e., with the preceding and subsequent characters within the
same word. However, characters such as Alif (ا), Daal (د), and Reh (ر) join only with the preceding
character in a word. Additionally, there are some characters that have no joining ability at all,
such as Hamza (ء). Based on these joining properties, Urdu characters can be broadly divided into two
categories: joiners and non-joiners. Joiners can acquire all four shapes depending on the neighboring
characters, whereas non-joiners can only acquire the isolated and final shapes.
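
The joining rules above can be expressed as a short Python sketch that assigns a positional form to every
character of a word. Characters are given by their Roman names in reading order; only the letters named in the
text are placed in the sets, any other name is treated as a joiner, and the function name and set names are
hypothetical. Letters with no joining ability are handled as a separate, isolated-only set.

```python
# Letters that join only with the preceding character (non-joiners).
NON_JOINERS = {"Alif", "Daal", "Reh"}   # illustrative subset from the text
# Letters with no joining ability at all.
NO_JOIN = {"Hamza"}


def positional_forms(word):
    """Return the form of each character: isolated, initial, medial, or final.

    `word` is a list of character names in reading order.
    """
    forms = []
    for i, ch in enumerate(word):
        prev = word[i - 1] if i > 0 else None
        nxt = word[i + 1] if i + 1 < len(word) else None
        # A character connects backwards only if the previous one can
        # connect forwards, and neither of the two is unjoinable.
        joins_prev = (prev is not None and ch not in NO_JOIN
                      and prev not in NON_JOINERS and prev not in NO_JOIN)
        # It connects forwards only if it is a joiner itself.
        joins_next = (nxt is not None and nxt not in NO_JOIN
                      and ch not in NON_JOINERS and ch not in NO_JOIN)
        if joins_prev and joins_next:
            forms.append("medial")
        elif joins_prev:
            forms.append("final")
        elif joins_next:
            forms.append("initial")
        else:
            forms.append("isolated")
    return forms


# Beh and Peh are joiners; Alif breaks the connection, so Teh stands alone.
example = positional_forms(["Beh", "Peh", "Alif", "Teh"])
```

Because a non-joiner never connects forwards, it can only appear in the isolated or final form, which matches
the two-category division described above.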


Figure 5. Urdu character set with phonemes and numerals with their Roman equivalences.


4 | Previous Studies in Urdu 


In [48], a deep neural network-based system for recognizing handwritten Urdu characters is introduced. The 
system was trained on a dataset of 74,285 samples and tested on a dataset of 21,223 samples. The proposed 
system achieved an impressive recognition rate of 98.82% for 133 classes, surpassing the performance of all 
existing state-of-the-art systems for Urdu language recognition. Moreover, the system's effectiveness was 
tested on datasets containing numeral characters from five different languages. The combined multilanguage 
database achieved an average recognition accuracy of 99.26%, with a precision of 99.29%. The recognition 
accuracy for each individual language was 99.322%. 


In [49], researchers present a hybrid approach for recognizing cursive Urdu Nastaliq script using 
convolutional and recursive neural networks. The proposed methodology combines CNN and MDLSTM for 
character recognition. The CNN extracts low-level, translation-invariant features, which are then forwarded
to MDLSTM for contextual feature extraction and learning. The experiments were conducted on the publicly 
available Urdu Printed Text-Line Image (UPTI) dataset using the proposed hierarchical combination of CNN 
and MDLSTM. The system achieved an accuracy of up to 98.12% for 44 classes, outperforming state-of-the- 
art results on the UPTI dataset. Therefore, this approach shows promising results in recognizing 
contemporary Urdu Nastaliq script with high accuracy without extracting traditional features. 


In [50], the authors propose an encoder-decoder-based hybrid deep learning approach with a CNN for the 
feature extraction part, a bi-directional gated recurrent unit network (BiGRU) as the encoder, and a gated 
recurrent unit network (GRU) as the decoder to recognize printed Urdu script in Nastaleeq font. The dataset 
was divided into three parts: a training dataset for model building (50%), a validation dataset for 
hyperparameter tuning (30%), and a test dataset for testing accuracy (20%). Two types of experiments were 
performed on Urdu script to evaluate the proposed method. The first experiment covered the variations 
of shape that depend on the position of the character (initial, final, medial, and isolated), which together 
yield 191 unique categories. The second experiment used only the 99 basic categories. The proposed 
model achieved an accuracy of 98.5% on the test dataset. 
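
The 50/30/20 partition described for [50] can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `split_dataset`, the random shuffling, and the fixed seed are choices made here, since the source does not state how samples were assigned to each part.

```python
import random


def split_dataset(samples, train_frac=0.5, val_frac=0.3, seed=0):
    """Split samples into train, validation, and test parts (test gets the remainder)."""
    items = list(samples)
    # Shuffling with a fixed seed is an assumption made for reproducibility.
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

Assigning the remainder to the test part guarantees that every sample lands in exactly one part even when the fractions do not divide the dataset size evenly.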


In [51], a Conv-transformer architecture is proposed for recognizing unconstrained off-line Urdu 
handwriting. The model first uses a CNN, followed by a vanilla full transformer, and is trained on both 
printed and handwritten Urdu text lines. The convolution layers reduce the spatial resolution and thereby 
mitigate the complexity of the transformer's multi-head attention layers, while the printed text images in the 
training phase help the model learn more ligatures and improve its language model. The proposed model 
achieves state-of-the-art accuracy with a CER of 5%, but the authors note that the limited amount of available 
training data is a significant challenge. They suggest that a full transformer without convolution layers could 
be used if there were more data, but with the current limited data, the convolution layers are crucial for 
achieving the best results. Moreover, although the proposed model achieves state-of-the-art accuracy for Urdu 
handwriting recognition, its performance on other languages with similar scripts or other types of handwriting 
recognition tasks is uncertain. 
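
For readers unfamiliar with the metric, the character error rate (CER) is commonly computed as the edit distance between the recognized text and the reference text, divided by the reference length. The sketch below assumes that usual definition; the source does not spell out how [51] computed it, and the function names are introduced here for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion from ref
                            curr[j - 1] + 1,      # insertion into ref
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition, a CER of 5% corresponds to roughly one character edit per 20 reference characters.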


In [52], the authors propose a new approach for recognizing Urdu Nasta'liq text using implicit segmentation 
based on multi-dimensional LSTM neural networks. The proposed system is compared with state-of-the-art 
Urdu text line recognition systems and is found to provide significant improvement in results. Specifically, 
the authors achieved a recognition accuracy of 98 ± 0.25%. 


In [53], researchers present an implicit segmentation-based recognition system for Urdu text lines in Nastaliq 
script using a multi-dimensional LSTM recurrent neural network (MDLSTM RNN) with a connectionist 
temporal classification (CTC) output layer that labels the character sequences. The proposed technique 
involves sliding overlapped windows on lines of text and extracting a set of statistical features. These extracted 
features are then fed to the MDLSTM RNN, which achieves a promising recognition rate of 94.5% on the 
standard Urdu Printed Text-Line Image (UPTI) database. 
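
The sliding-window step can be sketched as follows. The source does not list the statistical features used in [53], so the two features computed here (ink density and vertical centroid), the window width, the step, and the function name `window_features` are all assumptions chosen for illustration. The text-line image is represented as a list of rows of 0/1 pixels.

```python
def window_features(image, window=4, step=2):
    """Slide overlapping windows over a binary text-line image and
    return one (ink_density, vertical_centroid) pair per window."""
    height = len(image)
    width = len(image[0])
    features = []
    # A step smaller than the window width makes consecutive windows overlap.
    for start in range(0, max(width - window, 0) + 1, step):
        cols = range(start, min(start + window, width))
        ink = [(r, c) for r in range(height) for c in cols if image[r][c]]
        density = len(ink) / (height * len(cols))
        # Mean row index of ink pixels; 0.0 is used for an empty window.
        centroid = sum(r for r, _ in ink) / len(ink) if ink else 0.0
        features.append((density, centroid))
    return features
```

The resulting sequence of feature vectors, one per window, is the kind of input a recurrent network can then label.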


In [54], a multi-dimensional recurrent neural network and statistical features are utilized to recognize Urdu 
Nasta'liq text. The proposed methodology involves three stages: pre-processing and feature extraction, 
MDLSTM, and the CTC output layer. The system is tested on the UPTI dataset and achieves an accuracy of 
91.5%, demonstrating the effectiveness of the approach in recognizing the complex Nasta'liq script. However, 
the paper acknowledges a few limitations of the proposed system. Firstly, the system does not account for 
shape variation, which can affect recognition accuracy. Secondly, the dataset used in the experiment is limited 
to UPTI and may not be representative of all types of Urdu Nasta'liq text. Finally, the paper does not compare 
the proposed system with other state-of-the-art systems for Urdu Nasta'liq text recognition. 
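
To illustrate what a CTC output layer produces, the sketch below applies greedy (best-path) decoding: it takes the highest-scoring label per frame, collapses consecutive repeats, and drops the blank label. This is a common decoding rule and is shown here as an assumption; the source does not say which decoding strategy [53] or [54] used, and the blank index of 0 is a convention chosen for this example.

```python
BLANK = 0  # index of the CTC blank label (an assumption for this sketch)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Decode per-frame label scores into a label sequence."""
    # Best path: the highest-scoring label at every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]
    decoded = []
    previous = None
    for label in best_path:
        # Keep a label only when it differs from the previous frame and is not blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded
```

Because a blank between two identical labels resets `previous`, repeated characters in the text are still recovered, which is why CTC can label unsegmented text lines without explicit character segmentation.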


5 | Conclusion 


An accurate and fast OCR system for Arabic can benefit people in Arab and Muslim communities. However, 
most current OCR approaches operate offline and cannot recognize Arabic text in real time. Arabic character 
recognition remains challenging for several reasons, and research in this area is ongoing to improve existing 
systems. Many approaches are limited to private datasets or recognizing words or paragraphs, making it 
difficult to assess their performance on real-world Arabic text. 


This study aimed to thoroughly review previous research on Arabic OCR and identify key trends, issues, and 
challenges. A systematic analysis was conducted to evaluate deep learning techniques for feature extraction 
based on recognition performance. The findings suggest that hybrid feature extraction and classification 
methods are needed to achieve optimal Arabic OCR accuracy. 
Over the last decade, CNNs have been widely applied to Arabic OCR, demonstrating substantial gains over 
handcrafted approaches. This paper presents a comprehensive review of recent advances in Arabic OCR, 
including the characteristics of Arabic, different OCR systems, and research contributions. It also compares 
existing Arabic and Urdu OCR methods, serving as a primer for researchers interested in Arabic script 
recognition. 


Future work entails customizing and training well-known CNNs like GoogLeNet, VGGNet, AlexNet, and 
DenseNet to develop a new Arabic OCR system. Using large datasets to train CNN models can improve 
recognition rates, while combining recurrent neural networks (RNNs) with CNNs enables recognizing 
handwritten and printed Arabic text. Finally, evaluating the proposed approach on a standard benchmark can 
measure actual Arabic OCR system performance. 


In summary, this study conducted a systematic review of deep learning techniques for Arabic OCR and 
identified avenues for enhancing their recognition capability through hybrid models and customized CNN 
training on large datasets. Evaluating proposed approaches on a standard benchmark was also highlighted as 
a key step toward measuring real-world performance. 


The studies reviewed above do not follow a temporal or sequential progression in their methods or 
performance improvements, and none of them builds on the work of another. Moreover, some of the studies 
report results that appear unreliable or inaccurate. Therefore, in future work we will reimplement some of 
these studies and compare them, providing links to code that validates the results we report. 


In summary, the key objectives and proposals of this study were: 
- Critically analyze existing research on Arabic OCR and identify major trends and challenges. 
- Systematically evaluate deep learning techniques for feature extraction based on recognition 
performance. 
- Propose that hybrid approaches may be required to improve Arabic OCR accuracy. 